library(limma)
library(edgeR)
library(tximport)
library(AnnotationHub)
library(magrittr)
library(scales)
library(tidyverse)
library(pander)
theme_set(theme_bw())
The alignments in this analysis were generated by aligning each library (including technical replicates) to the Zebrafish transcriptome from Ensembl Release 94 (GRCz11)) using kallisto (v0.43.1). In addition to the standard transcriptome, the two mutant psen2 transcripts were manually added to the reference.
The corresponding set of gene descriptions were then loaded into R as an EnsDb object using the AnnotationHub() infrastructure. Likewise, the set of transcript descriptions were loaded, with the manual addition of the two novel psen2 mutants.
ah <- AnnotationHub() %>%
subset(species == "Danio rerio") %>%
subset(rdataclass == "EnsDb")
ensDb <- ah[["AH64906"]]
genes <- genes(ensDb)
mcols(genes) <- mcols(genes)[!names(mcols(genes)) %in% c("seq_coord_system", "gene_name")]
transcripts <- transcripts(ensDb)
psen2 <- subset(transcripts, gene_id == "ENSDARG00000015540") %>%
granges() %>%
magrittr::extract(c(1, 1),)
names(psen2) <- c("psen2T141_L142delinsMISLISV", "psen2N140fs")
mcols(psen2) <- DataFrame(tx_id = names(psen2),
tx_biotype = c("protein_coding", "nonsense_mediated_decay"),
gene_id = "ENSDARG00000015540",
tx_id_version = names(psen2),
tx_name = names(psen2))
transcripts %<>%
GRangesList(psen2) %>%
unlist() %>%
sort()
transcripts$symbol <- mcols(genes[transcripts$gene_id])$symbol
tx2gene <- mcols(transcripts)[c("tx_id_version", "gene_id")]
geneCounts <- list.files("../3_kallisto/", recursive = TRUE, pattern = "h5", full.names = TRUE) %>%
set_names(basename(dirname(.))) %>%
set_names(str_replace(names(.), "_-", "WT")) %>%
tximport(type = "kallisto", tx2gene = tx2gene)
Gene-level counts were imprted using tximport, mapping transcripts to genes.
minCpm <- 1
minSamples <- 5
dge <- geneCounts$counts %>%
magrittr::extract(rowSums(cpm(.) > minCpm) >= minSamples,) %>%
DGEList(genes = genes[rownames(.)]) %>%
calcNormFactors()
dge$samples %<>%
mutate(Sample = rownames(.),
Genotype = str_extract(Sample, "(WT|FAD|FS)"),
Genotype = factor(Genotype, levels = c("WT", "FS", "FAD")),
group = as.integer(Genotype)) %>%
set_rownames(.$Sample)
Genes were retained for analysis if a CPM > 1 was observed for \(\geq\) 5 samples. This equated to about 31 reads for a gene in at least 5 samples for inclusion in downstream analysis, giving a total of 20,743 of the original genes for DGE analysis.
Total counts from each library after assigning to genes
geneVoom <- dge %>%
voomWithQualityWeights(design = matrix(rep(1, ncol(.)), ncol = 1))
Counts were also processed using the voom transformation using quality weights to allow for analysis using normal-based algorithms. Sample weights ranged between Quitting from lines 110-112 (DGE.Rmd) Error in pander(round(range(v\(targets\)sample.weights), 4)) : object ‘v’ not found Calls:
Transcript-level counts were imported using catchKallisto() from edgeR in order to utilise the voom transformation on transcript-level counts.
transCounts <- list.files("../3_kallisto/", full.names = TRUE) %>%
catchKallisto()
## Reading ../3_kallisto//1_FAD_5, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//10_FS_1, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//11__-_5, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//12__-_4, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//13__-_3, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//14__-_2, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//15__-_1, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//2_FAD_4, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//3_FAD_3, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//4_FAD_2, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//5_FS_2, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//6_FAD_1, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//7_FS_5, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//8_FS_4, 57777 transcripts, 20 bootstraps
## Reading ../3_kallisto//9_FS_3, 57777 transcripts, 20 bootstraps
colnames(transCounts$counts) %<>%
basename() %>%
str_replace("_-", "WT")
transVoom <- transCounts$counts %>%
magrittr::extract(rowSums(cpm(.) > minCpm) >= minSamples,) %>%
divide_by(transCounts$annotation[rownames(.),"Overdispersion"]) %>%
set_rownames(str_remove(rownames(.), "\\.[0-9]+")) %>%
DGEList(samples = dge$samples[colnames(.),],
genes = transcripts[rownames(.)]) %>%
calcNormFactors() %>%
voomWithQualityWeights(plot = TRUE, design = matrix(1, nrow = ncol(.), ncol = 1))
Sample weights using transcript-level counts, showing near identical patterns to those observed at the gene-level.
psen2IDs <- subset(transcripts, symbol == "psen2") %>% names()
transCounts$counts %>%
cpm() %>%
set_rownames(str_remove(rownames(.), "\\.[0-9]+")) %>%
magrittr::extract(psen2IDs,) %>%
as.data.frame() %>%
rownames_to_column("Transcript") %>%
gather(key = "Sample", value = "CPM", -Transcript) %>%
mutate(Transcript = str_replace(Transcript, "ENSDART.+", "psen2-WT")) %>%
left_join(transVoom$targets) %>%
ggplot(aes(Sample, CPM, fill = Transcript)) +
geom_bar(stat = "identity") +
facet_wrap(~Genotype, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Transcript abundances (using CPM) were calculated for each of the three psen2 transcripts, and showed expected patterns of heterozygous expression for FAD samples and all WT expression for the WT samples. However for sample 8_FS_4, no WT allele was detected and this sample should be excluded from all analyses. The remaining FS samples showed reduced abundance of the FS transcript, as expected under NMD. No increases in expression of the WT allele were evident, supporting a lack of genetic compensation.
This sample was then removed from all objects, along with 12_WT_4 which was consistently down-weighted.
transVoom <- transVoom[, !colnames(transVoom) %in% c("8_FS_4", "12_WT_4")]
geneVoom <- geneVoom[, !colnames(geneVoom) %in% c("8_FS_4", "12_WT_4")]
dge <- dge[,!colnames(dge) %in% c("8_FS_4", "12_WT_4")]
The first step for the analysis was to perform an MDS analysis. However, minimal separation was observed between sample groups, with the exception of the most strongly down-weighted sample, which separated very clearly from the remainder of the points. This sample may be a candidate for exclusion if no differentially expressed genes are able to be detected. A simple PCA also revealed that the first few principal components only capture a lesser amount of the total variability than may be expected.
mds <- geneVoom %>%
plotMDS(plot = FALSE) %>%
extract2("cmdscale.out") %>%
set_colnames(paste0("Dim", 1:2))
MDS plot showing no clear groups within the data. Point sizes indicate sample weights as calculated by voomWithQualityWeights().
| PC1 | PC2 | PC3 | PC4 | PC5 | |
|---|---|---|---|---|---|
| Standard deviation | 28.94 | 25.91 | 24.47 | 22.66 | 21.04 |
| Proportion of Variance | 0.1593 | 0.1277 | 0.1139 | 0.0977 | 0.08418 |
| Cumulative Proportion | 0.1593 | 0.287 | 0.4009 | 0.4986 | 0.5828 |
mm <- model.matrix(~0 + Genotype, data = geneVoom$targets) %>%
set_colnames(gsub("Genotype", "", colnames(.)))
cm <- makeContrasts(FSvWT = FS - WT,
FADvWT = FAD - WT,
FADvFS = FAD - FS,
levels = colnames(mm))
Three comparisons were defined with the first two being the difference between the two mutant families and the wild-type samples. The third comparison was defined as being between the two mutant groups.
geneFit <- geneVoom %>%
lmFit(mm) %>%
contrasts.fit(cm) %>%
eBayes()
colnames(cm) %>%
sapply(function(x){
topTable(geneFit, x, sort.by = "p") %>%
mutate(Comparison = x)
}, simplify = FALSE)
transFit <- transVoom %>%
lmFit(mm) %>%
contrasts.fit(cm) %>%
eBayes()
colnames(cm) %>%
sapply(function(x){
topTable(transFit, x, sort.by = "p") %>%
mutate(Comparison = x)
}, simplify = FALSE)